Identifying Information Units for Multiple Document Summarization

Authors

Seamus Lyons

Dan J. Smith

Abstract

Multiple document summarization is becoming increasingly important as a way of reducing information overload, particularly in the context of the proliferation of similar accounts of events that are available on the Web. Removal of similar sentences often results in either partial or unwanted elimination of important information. In this paper, we present an approach to split sentences into their component clauses and use these clauses to produce comprehensive summaries of multiple documents describing particular events. Detailed analysis of all clauses and clause boundaries may be complex and computationally expensive. Our rule-based approach demonstrates that it is possible to achieve high accuracy in reasonable time.
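The abstract does not list the authors' rules, so the following is only a minimal sketch of what rule-based clause splitting might look like, not the method from the paper. The marker list `CLAUSE_MARKERS`, the boundary pattern, and the helper names `split_clauses` and `merge_unique_clauses` are all assumptions introduced for illustration. The idea shown is that comparing clauses rather than whole sentences lets a summarizer drop a repeated clause while keeping the new information attached to it.

```python
import re
from typing import List

# Hypothetical marker words; the paper's real rule set is not given in the abstract.
CLAUSE_MARKERS = ("and", "but", "or", "because", "although", "while", "when", "which", "who")

# Assumed boundary rule: a semicolon, or a comma directly followed by a marker word.
_BOUNDARY = re.compile(
    r"\s*;\s*|\s*,\s*(?=(?:%s)\b)" % "|".join(CLAUSE_MARKERS),
    flags=re.IGNORECASE,
)


def split_clauses(sentence: str) -> List[str]:
    """Split one sentence into candidate clauses using the assumed boundary rule."""
    text = sentence.strip().rstrip(".!?")
    return [part.strip() for part in _BOUNDARY.split(text) if part.strip()]


def merge_unique_clauses(sentences: List[str]) -> List[str]:
    """Collect clauses from several sentences, keeping the first copy of each."""
    seen = set()
    merged = []
    for sentence in sentences:
        for clause in split_clauses(sentence):
            # Normalize case and punctuation so trivially different copies match.
            key = re.sub(r"\W+", " ", clause.lower()).strip()
            if key not in seen:
                seen.add(key)
                merged.append(clause)
    return merged


if __name__ == "__main__":
    accounts = [
        "The storm hit the coast on Monday, and thousands of homes lost power.",
        "The storm hit the coast on Monday, although no injuries were reported.",
    ]
    for clause in merge_unique_clauses(accounts):
        print(clause)
```

With whole-sentence comparison, the two accounts above would either both be kept (repeating the first clause) or one would be dropped (losing its second clause). Splitting first keeps one copy of the shared clause and both distinct clauses. Exact-match deduplication is a simplification; a real system would need a similarity measure between clauses.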


Similar resources

A survey on Automatic Text Summarization

Text summarization endeavors to produce a summary version of a text, while maintaining the original ideas. The textual content on the web, in particular, is growing at an exponential rate. The ability to decipher through such a massive amount of data, in order to extract the useful information, is a major undertaking and requires an automatic mechanism to aid with the extant repository of informa...

Towards Multidocument Summarization by Reformulation: Progress and Prospects

By synthesizing information common to retrieved documents, multi-document summarization can help users of information retrieval systems to find relevant documents with a minimal amount of reading. We are developing a multidocument summarization system to automatically generate a concise summary by identifying and synthesizing similarities across a set of related documents. Our approach is uniqu...

Detecting Text Similarity over Short Passages: Exploring Linguistic Feature Combinations via Machine Learning

We present a new composite similarity metric that combines information from multiple linguistic indicators to measure semantic distance between pairs of small textual units. Several potential features are investigated and an optimal combination is selected via machine learning. We discuss a more restrictive definition of similarity than traditional, document-level and information retrieval-ori...

Mr&mr-Sum: Maximum Relevance and Minimum Redundancy Document Summarization Model

We have presented an approach to automatic document summarization. In the proposed approach, text summarization is modeled as a quadratic integer-programming problem. This model generally attempts to optimize three properties, namely, (1) relevance: summary should contain informative textual units that are relevant to the user; (2) redundancy: summaries should not contain multiple textual units...

Improving the Performance of the Random Walk Model for Answering Complex Questions

We consider the problem of answering complex questions that require inferencing and synthesizing information from multiple documents and can be seen as a kind of topic-oriented, informative multi-document summarization. The stochastic, graph-based method for computing the relative importance of textual units (i.e. sentences) is very successful in generic summarization. In this method, a sentence...

Publication date: 2005
